Algorithmic patterns for H-matrices on many-core processors
In this work, we consider the reformulation of hierarchical (H) matrix
algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). H-matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of H-matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing H-matrix CPU implementations with many-core
processors, we here aim at relying entirely on that processor type. As our main
contribution, we introduce the parallel algorithmic patterns necessary
to map the full H-matrix construction and the fast matrix-vector
product to many-core hardware. Crucial ingredients are space-filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation hmglib is, to the best of the authors'
knowledge, the first entirely GPU-based Open Source H-matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard H-matrix library,
highlighting profound speedups of our many-core parallel approach
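The space-filling-curve ingredient can be illustrated with a Morton (Z-order) encoding; the sketch below is a generic textbook construction, not the hmglib implementation:

```python
def morton2d(x, y, bits=16):
    """Interleave the bits of integer coordinates (x, y) into a
    Z-order (Morton) index; points close in 2D tend to stay close
    along the curve, which cluster-tree construction exploits."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

# Sorting by Morton code groups spatially nearby points together,
# yielding the balanced point ordering a cluster tree is built on.
pts = [(0, 0), (7, 7), (1, 0), (6, 7)]
order = sorted(pts, key=lambda p: morton2d(*p))
```

Sorting by such a code is a fully data-parallel operation, which is one reason space-filling curves map well to GPUs.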
A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, we here discuss the first fully GPU-based
distributed-memory parallel hierarchical matrix Open Source library using the
traditional H-matrix format and adaptive cross approximation with an
application to BEM problems
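The reported scaling figure can be read as relative parallel efficiency; the helper below makes the definition concrete, using illustrative timings rather than the paper's measurements:

```python
def relative_efficiency(t_small, t_large, p_small=128, p_large=1024):
    """Relative parallel efficiency when scaling from p_small to
    p_large processors: achieved speedup divided by ideal speedup."""
    return (t_small / t_large) / (p_large / p_small)

# Illustrative numbers only: an 8x increase in GPUs that cuts the
# runtime from 8.0 to 1.5 (arbitrary units) gives ~67% efficiency.
eff = relative_efficiency(8.0, 1.5)
```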
Kernel-based stochastic collocation for the random two-phase Navier-Stokes equations
In this work, we apply stochastic collocation methods with radial kernel
basis functions for an uncertainty quantification of the random incompressible
two-phase Navier-Stokes equations. Our approach is non-intrusive and we use the
existing fluid dynamics solver NaSt3DGPF to solve the incompressible two-phase
Navier-Stokes equations for each given realization. We are able to empirically
show that the resulting kernel-based stochastic collocation is highly
competitive in this setting and even outperforms some other standard methods
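The non-intrusive collocation idea can be sketched in a few lines: the solver is treated as a black box evaluated once per collocation node, and a kernel interpolant serves as the surrogate. The Gaussian kernel, node choice, and the toy quantity of interest below are illustrative assumptions, with a cheap function standing in for a NaSt3DGPF run:

```python
import numpy as np

def gaussian_kernel(x, y, eps=2.0):
    # Gram matrix of the Gaussian radial kernel on 1D point sets.
    return np.exp(-(eps * (x[:, None] - y[None, :])) ** 2)

def collocation_surrogate(nodes, samples, eps=2.0):
    # Solve K c = f for the kernel weights; evaluating the surrogate
    # at new stochastic parameters is then a cheap kernel sum.
    K = gaussian_kernel(nodes, nodes, eps)
    coeff = np.linalg.solve(K, samples)
    return lambda x: gaussian_kernel(np.atleast_1d(x), nodes, eps) @ coeff

# Toy quantity of interest standing in for one full solver run
# per realization of the random input parameter xi.
qoi = np.sin
nodes = np.linspace(-1.0, 1.0, 5)
surrogate = collocation_surrogate(nodes, qoi(nodes))
```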
On the algebraic construction of sparse multilevel approximations of elliptic tensor product problems
We consider the solution of elliptic problems on the tensor product of two physical domains as for example present in the approximation of the solution covariance of elliptic partial differential equations with random input. Previous sparse approximation approaches used a geometrically constructed multilevel hierarchy. Instead, we construct this hierarchy for a given discretized problem by means of the algebraic multigrid method. Thereby, we are able to apply the sparse grid combination technique to problems given on complex geometries and for discretizations arising from unstructured grids, which was not feasible before. Numerical results show that our algebraic construction exhibits the same convergence behaviour as the geometric construction, while being applicable even in black-box type PDE solvers
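The sparse grid combination technique underlying this approach sums anisotropic approximations with +1/−1 coefficients; the 2D coefficient pattern can be sketched as follows (levels and indexing here are illustrative):

```python
def combination_terms(L):
    """2D combination technique: combine the level-(l1, l2)
    solutions with coefficient +1 on the diagonal l1 + l2 = L
    and -1 on l1 + l2 = L - 1 (levels starting at 0)."""
    terms = [((l1, L - l1), +1) for l1 in range(L + 1)]
    terms += [((l1, L - 1 - l1), -1) for l1 in range(L)]
    return terms

# For L = 3 this yields 4 positive and 3 negative contributions;
# the coefficients sum to 1, so constant functions are reproduced.
terms = combination_terms(3)
```

The algebraic contribution of the paper is to obtain the per-level problems from an AMG hierarchy instead of a geometric grid hierarchy; the combination formula itself is unchanged.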
Ensemble Kalman filters for reliability estimation in perfusion inference
We consider the solution of inverse problems in dynamic contrast–enhanced imaging by means of Ensemble Kalman filters. Our quantity of interest is blood perfusion, i.e. blood flow rates in tissue. While existing approaches to compute blood perfusion parameters for given time series of radiological measurements mainly rely on deterministic, deconvolution–based methods, we aim at recovering probabilistic solution information for given noisy measurements. To this end, we model radiological image capturing as sequential data assimilation process and solve it by an Ensemble Kalman filter. Thereby, we recover deterministic results as ensemble–based mean and are able to compute reliability information such as probabilities for the perfusion to be in a given range. Our target application is the inference of blood perfusion parameters in the human brain. A numerical study shows promising results for artificial measurements generated by a Digital Perfusion Phantom
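The analysis step of a stochastic Ensemble Kalman filter, the building block of the sequential assimilation described above, can be sketched generically; the update below is the standard textbook form, and the toy 1D setup is purely illustrative:

```python
import numpy as np

def enkf_update(ensemble, obs, H, obs_cov, rng):
    """Stochastic EnKF analysis step: shift each member toward
    perturbed observations using the forecast sample covariance."""
    X = ensemble - ensemble.mean(axis=0)           # anomalies
    P = X.T @ X / (len(ensemble) - 1)              # sample covariance
    S = H @ P @ H.T + obs_cov                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    perturbed = obs + rng.multivariate_normal(
        np.zeros(len(obs)), obs_cov, size=len(ensemble))
    return ensemble + (perturbed - ensemble @ H.T) @ K.T

# Toy 1D demo: members scattered around 0, observation at 5 with
# small noise; the analysis ensemble shifts toward the observation,
# and its spread gives the reliability information discussed above.
rng = np.random.default_rng(0)
prior = rng.normal(0.0, 1.0, size=(200, 1))
analysis = enkf_update(prior, np.array([5.0]), np.array([[1.0]]),
                       np.array([[0.01]]), rng)
```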
Analysis and parallelization strategies for Ruge-Stüben AMG on many-core processors
The Ruge-Stüben algebraic multigrid method (AMG) is an optimal-complexity black-box approach to solve linear systems arising in discretizations of, e.g., elliptic PDEs. Recently, there has been a growing interest in parallelizing this method on many-core hardware, especially graphics processing units (GPUs). This type of hardware delivers high performance for highly parallel algorithms. In this work, we analyse convergence properties of recent AMG developments for many-core processors and propose to use more classical choices of AMG components for higher robustness. Based on these choices, we introduce many-core parallelization strategies for a robust hybrid many-core AMG. The strategies can be understood and applied without deep knowledge of a given many-core architecture. We use them to propose a new hybrid GPU implementation. The implementation is tested in an in-depth performance analysis, which outlines its good convergence properties and high performance in the solve phase
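The core two-grid correction that AMG applies recursively can be sketched on a 1D model problem; the interpolation operator below is a geometric stand-in, since actual Ruge-Stüben coarsening derives it from strength of connection:

```python
import numpy as np

def two_grid_cycle(A, b, x, P, nu=2, omega=2.0 / 3.0):
    """One two-grid cycle: weighted-Jacobi pre-smoothing, Galerkin
    coarse-grid correction, weighted-Jacobi post-smoothing."""
    D = np.diag(A)
    for _ in range(nu):                           # pre-smoothing
        x = x + omega * (b - A @ x) / D
    r = b - A @ x
    A_c = P.T @ A @ P                             # Galerkin coarse operator
    x = x + P @ np.linalg.solve(A_c, P.T @ r)     # coarse correction
    for _ in range(nu):                           # post-smoothing
        x = x + omega * (b - A @ x) / D
    return x

# 1D Poisson with 7 interior points; linear interpolation from
# 3 coarse points is a purely geometric stand-in for AMG's P.
A = 2 * np.eye(7) - np.eye(7, k=1) - np.eye(7, k=-1)
P = np.zeros((7, 3))
for j, c in enumerate([1, 3, 5]):
    P[c, j] = 1.0
    P[c - 1, j] += 0.5
    P[c + 1, j] += 0.5
b = np.ones(7)
x = np.zeros(7)
for _ in range(15):
    x = two_grid_cycle(A, b, x, P)
```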
Boosting quantum machine learning models with multi-level combination technique: Pople diagrams revisited
Inspired by Pople diagrams popular in quantum chemistry, we introduce a hierarchical scheme, based on the multilevel combination (C) technique, to combine various levels of approximations made when calculating molecular energies within quantum chemistry. When combined with quantum machine learning (QML) models, the resulting CQML model is a generalized unified recursive kernel ridge regression which exploits correlations implicitly encoded in training data comprised of multiple levels in multiple dimensions. Here, we have investigated up to three dimensions: chemical space, basis set, and electron correlation treatment. Numerical results have been obtained for atomization energies of a set of ∼7'000 organic molecules with up to 7 atoms (not counting hydrogens) containing CHONFClS, as well as for ∼6'000 constitutional isomers of C7H10O2. CQML learning curves for atomization energies suggest a dramatic reduction in necessary training samples calculated with the most accurate and costly method. In order to generate millisecond estimates of CCSD(T)/cc-pvdz atomization energies with prediction errors reaching chemical accuracy (∼1 kcal/mol), the CQML model requires only ∼100 training instances at CCSD(T)/cc-pvdz level, rather than thousands within conventional QML, while more training molecules are required at lower levels. Our results suggest a possibly favorable trade-off between various hierarchical approximations whose computational cost scales differently with electron number
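The recursive multilevel idea can be illustrated with a two-level kernel ridge regression on toy data, where a correction model trained on few "expensive" samples is stacked on a base model trained on many "cheap" samples. All functions, kernels, and parameters here are illustrative, not the CQML model itself:

```python
import numpy as np

def krr_fit(X, y, gamma=10.0, lam=1e-8):
    """Plain 1D kernel ridge regression with a Gaussian kernel."""
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Z: np.exp(-gamma * (Z[:, None] - X[None, :]) ** 2) @ alpha

# Many samples of a cheap low-level method, few of an expensive one;
# the high-level model only has to learn the (smoother) correction.
x_lo = np.linspace(0.0, 1.0, 40)
y_lo = np.sin(4 * x_lo)                    # stand-in "cheap" energies
x_hi = np.linspace(0.0, 1.0, 8)
y_hi = np.sin(4 * x_hi) + 0.1 * x_hi       # stand-in "expensive" energies

base = krr_fit(x_lo, y_lo)
delta = krr_fit(x_hi, y_hi - base(x_hi))   # correction model only
model = lambda z: base(z) + delta(z)
```

This mirrors the trade-off the abstract describes: accuracy near the expensive level is reached while most training data comes from the cheap level.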